
31 research outputs found

An introductory statistical study of Hungarian word order

Author: Nemeskey Dávid Márk
Publication date: 01/01/2022

Hungarian is often cited as a language with free word order. While this is not strictly true, the rules that govern its sentence structure derive from pragmatics and are thus much more flexible than those of analytic languages such as English. This paper presents an introductory statistical study of Hungarian word order. We report the order of verbal arguments in simple sentences in two corpora: the Hungarian Wikipedia and TrendMiner. An experimental method for ordering adjectives in noun phrases is also presented.
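The core counting step behind reporting argument orders can be sketched in a few lines of Python. This is a minimal sketch, not the paper's pipeline: it assumes each sentence has already been analyzed and every token labeled with a hypothetical role tag ("S", "V", "O" or None), which simplifies real parser output considerably. The four Hungarian example sentences are invented for illustration.

```python
from collections import Counter

# Hypothetical input format: each simple sentence is a list of (token, role)
# pairs, where role is "S" (subject), "V" (verb), "O" (object) or None.
ROLES = {"S", "V", "O"}


def argument_order(sentence):
    """Return the linear order of S, V and O in one sentence, e.g. "SOV"."""
    return "".join(role for _, role in sentence if role in ROLES)


def order_distribution(sentences):
    """Relative frequency of each argument order, most frequent first."""
    counts = Counter(argument_order(s) for s in sentences)
    total = sum(counts.values())
    return {order: n / total for order, n in counts.most_common()}


if __name__ == "__main__":
    corpus = [
        # "Péter olvassa a könyvet" (Peter reads the book)
        [("Péter", "S"), ("olvassa", "V"), ("a", None), ("könyvet", "O")],
        # "A könyvet Péter olvassa" (focus on Péter)
        [("A", None), ("könyvet", "O"), ("Péter", "S"), ("olvassa", "V")],
        # "Péter a könyvet olvassa" (focus on the book)
        [("Péter", "S"), ("a", None), ("könyvet", "O"), ("olvassa", "V")],
        # "Péter olvassa az újságot" (Peter reads the newspaper)
        [("Péter", "S"), ("olvassa", "V"), ("az", None), ("újságot", "O")],
    ]
    print(order_distribution(corpus))
    # {'SVO': 0.5, 'OSV': 0.25, 'SOV': 0.25}
```

Keeping the order as a plain string such as "OSV" makes it easy to aggregate counts separately per corpus (for example Wikipedia vs. TrendMiner) and compare the resulting distributions.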

University of Szeged

Introducing huBERT

Author: Nemeskey Dávid Márk
Publication date: 01/01/2021

This paper introduces the huBERT family of models. The flagship is the eponymous BERT Base model trained on the new Hungarian Webcorpus 2.0, a 9-billion-token corpus of web text collected from the Common Crawl. This model outperforms multilingual BERT in masked language modeling by a large margin and achieves state-of-the-art performance in named entity recognition and NP chunking. The models are freely downloadable.
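Masked language modeling, the task on which huBERT is compared with multilingual BERT, trains a model to recover hidden tokens from their context. Below is a minimal sketch of the input-masking step, assuming the standard recipe from the original BERT paper (about 15% of positions selected; of those, 80% replaced by [MASK], 10% by a random token, 10% left unchanged). The abstract does not state whether huBERT was trained with exactly these settings, and real BERT training masks subword tokens rather than whole words.

```python
import random

MASK = "[MASK]"


def mask_tokens(tokens, vocab, mask_prob=0.15, rng=None):
    """BERT-style masking (assumed standard 80/10/10 recipe).

    Returns (inputs, labels): labels[i] holds the original token at positions
    the model must predict, and None everywhere else.
    """
    if rng is None:
        rng = random.Random()
    inputs = list(tokens)
    labels = [None] * len(tokens)
    for i, tok in enumerate(tokens):
        if rng.random() >= mask_prob:
            continue  # position not selected as a prediction target
        labels[i] = tok
        r = rng.random()
        if r < 0.8:
            inputs[i] = MASK  # 80%: replace with the mask token
        elif r < 0.9:
            inputs[i] = rng.choice(vocab)  # 10%: replace with a random token
        # remaining 10%: keep the original token unchanged
    return inputs, labels


if __name__ == "__main__":
    rng = random.Random(42)
    sentence = "a huBERT modell magyar szövegen tanult".split()
    print(mask_tokens(sentence, vocab=sentence, rng=rng))
```

Leaving some selected tokens unchanged or randomized is what keeps the model from relying on the [MASK] symbol itself, which never appears at fine-tuning time.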

University of Szeged

Automatically generated NE tagged corpora for English and Hungarian

Authors: Nemeskey Dávid Márk, Simon Eszter
Publication venue: 'Association for Computational Linguistics (ACL)'
Publication date: 01/01/2012

Supervised named entity recognizers require large amounts of annotated text. Since manual annotation is highly costly, reducing the annotation cost is essential. We present a fully automatic method to build NE-annotated corpora from Wikipedia. In contrast to recent work, we apply a new method that maps DBpedia classes to CoNLL NE types. Since our method is largely language-independent, we used it to generate corpora for English and Hungarian. The corpora are freely available.
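A class-to-type mapping of this kind can be sketched as a walk up the ontology until a class with a known CoNLL type is reached. This is a hedged illustration only: the ontology excerpt, the mapping table and the `tag_link` helper (which assumes link anchor text is tagged with the type of the linked article) are hypothetical and are not the mapping published with the paper.

```python
# Hypothetical excerpt of the DBpedia ontology: child class -> parent class.
PARENT = {
    "dbo:Scientist": "dbo:Person",
    "dbo:Person": "dbo:Agent",
    "dbo:Company": "dbo:Organisation",
    "dbo:Organisation": "dbo:Agent",
    "dbo:Agent": "owl:Thing",
    "dbo:City": "dbo:Settlement",
    "dbo:Settlement": "dbo:PopulatedPlace",
    "dbo:PopulatedPlace": "dbo:Place",
    "dbo:Place": "owl:Thing",
    "dbo:Film": "dbo:Work",
    "dbo:Work": "owl:Thing",
}

# Hypothetical mapping from selected DBpedia classes to CoNLL NE types.
TO_CONLL = {
    "dbo:Person": "PER",
    "dbo:Organisation": "ORG",
    "dbo:Place": "LOC",
    "dbo:Work": "MISC",
}


def conll_type(dbpedia_class):
    """Return the CoNLL type of the nearest mapped ancestor, or None."""
    cls = dbpedia_class
    while cls is not None:
        if cls in TO_CONLL:
            return TO_CONLL[cls]
        cls = PARENT.get(cls)  # None once we step past the top of the excerpt
    return None


def tag_link(anchor_tokens, target_class):
    """Tag the tokens of a link anchor in IOB2 format (hypothetical helper)."""
    ne = conll_type(target_class)
    if ne is None:
        return [(tok, "O") for tok in anchor_tokens]
    return [(tok, ("B-" if i == 0 else "I-") + ne)
            for i, tok in enumerate(anchor_tokens)]


if __name__ == "__main__":
    print(tag_link(["Albert", "Einstein"], "dbo:Scientist"))
    # [('Albert', 'B-PER'), ('Einstein', 'I-PER')]
```

Mapping only a few high-level classes and inheriting the type downward keeps the table small, and nothing in it depends on the language of the Wikipedia edition.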

CiteSeerX

SZTAKI Publication Repository

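Egy emBERT próbáló feladat [A trying task; the title puns on the Hungarian idiom "embert próbáló feladat", "a task that tries a person"]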

Author: Nemeskey Dávid Márk
Publication date: 01/01/2020

In the past year or two, deep contextual word embeddings have displaced traditional, hand-crafted feature sets in most language processing tasks. Nevertheless, Hungarian language processing systems (e-magyar, magyarlanc) still work with traditional, hand-crafted features. In this paper we present the emBERT module, which uses the transformers library to integrate classifiers based on contextual word embeddings into the e-magyar system. We trained the module for NP chunking and named entity recognition. On both tasks, the models improve on the previous best results.

SZTAKI Publication Repository

University of Szeged

Detecting Optional Arguments of Verbs

Authors: Kornai András, Nemeskey Dávid Márk, Recski Gábor András
Publication venue: European Language Resources Association (ELRA)
Publication date: 01/01/2016

SZTAKI Publication Repository

Identification of Disaster-implicated Named Entities

Authors: Kornai András, Nemeskey Dávid Márk, Ács Judit
Publication date: 01/01/2017

SZTAKI Publication Repository

Tagmondatokra bontás és NP-chunking függőségi alapon [Clause segmentation and NP chunking based on dependency parsing]

Authors: Dömötör Andrea, Nemeskey Dávid Márk
Publication date: 01/01/2023

In this paper we attempt to identify clauses and the type of relation between them from patterns in the dependency parse. Since no gold-standard data yet exist for evaluating this task, we tested our method on another task, NP chunking. Evaluating the latter was made difficult by the fact that the supposedly gold-standard corpora contained several errors, both in the dependency annotation and in the NP chunking. Despite all this, we achieved an F-score of 89%, which, although it falls short of the state of the art, is still promising because it was achieved with a simple set of rules. On this basis, pattern matching over dependency parses may be a method worth further research for similar tasks.
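Chunking F-scores like the 89% above are commonly computed at the chunk level: a predicted chunk counts as correct only if its span exactly matches a gold chunk, and F is the harmonic mean of precision and recall. The abstract does not describe the evaluation details, so the following is a minimal sketch under that assumption, with chunks encoded as hypothetical B/I/O tags (one tag per token).

```python
def bio_to_spans(tags):
    """Convert B/I/O tags to (start, end) token spans, end exclusive.

    An "I" without a preceding chunk is leniently treated as a chunk start.
    """
    spans, start = [], None
    for i, tag in enumerate(tags):
        if tag == "B" or (tag == "I" and start is None):
            if start is not None:
                spans.append((start, i))  # close the previous chunk
            start = i
        elif tag == "O" and start is not None:
            spans.append((start, i))
            start = None
    if start is not None:
        spans.append((start, len(tags)))  # chunk running to sentence end
    return spans


def chunk_prf(gold, predicted):
    """Exact-match chunk precision, recall and F1 over span collections."""
    gold, predicted = set(gold), set(predicted)
    tp = len(gold & predicted)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    gold = bio_to_spans(["B", "I", "O", "B", "O"])  # [(0, 2), (3, 4)]
    pred = bio_to_spans(["B", "I", "O", "B", "I"])  # [(0, 2), (3, 5)]
    print(chunk_prf(gold, pred))  # (0.5, 0.5, 0.5)
```

Because matching is exact, a chunk whose boundary is off by a single token counts as both a false positive and a false negative, which is one reason annotation errors in a gold corpus weigh heavily on the score.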

University of Szeged

Szemantikus névelem-azonosítás magyar nyelvű szövegeken (a HuWikifier bemutatása) [Semantic named entity identification in Hungarian texts (introducing HuWikifier)]

Authors: Nemeskey Dávid Márk, Palkó Gábor
Publication venue: 'HUNGARNET'
Publication date: 01/01/2022

Repository of the Academy's Library

Building word embeddings from dictionary definitions

Authors: Nemeskey Dávid Márk, Recski Gábor András, Ács Judit
Publication venue: Research Institute for Linguistics, Hungarian Academy of Sciences (RIL HAS)
Publication date: 01/01/2017

SZTAKI Publication Repository

Evaluating multi-sense embeddings for semantic resolution monolingually and in word translation

Authors: Borbély Gábor, Kornai András, Makrai Márton, Nemeskey Dávid Márk
Publication venue: 'Association for Computational Linguistics (ACL)'
Publication date: 01/01/2016

SZTAKI Publication Repository

Repository of the Academy's Library